This tutorial introduces some basic R code for plotting and visualizing data.

1 Learning goals

In this short tutorial, you will learn to:

  • make basic scatterplots in R
  • make boxplots in R
  • subset data when referencing a dataset (here used in plots)

Set your working directory, just like before.
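As a reminder, a minimal sketch of checking and changing the working directory with base R's getwd() and setwd() might look like this. Here tempdir() is only a placeholder so the example runs anywhere; replace it with the path to your own folder.

```r
# Check which folder R is currently working in
old_wd <- getwd()
print(old_wd)

# Change the working directory.
# tempdir() is a placeholder (an assumption for this sketch);
# replace it with the path to your own project folder.
setwd(tempdir())
new_wd <- getwd()
print(new_wd)

# Switch back to where we started
setwd(old_wd)
```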

2 The Iris Dataset

We will plot some data using a dataset that comes with base R. These data were collected by Edgar Anderson in 1935 and used by R.A. Fisher (and many, many people since). The iris dataset gives measurements, in centimeters, of sepal length, sepal width, petal length, and petal width for 50 flowers from each of 3 species of iris: Iris setosa, Iris versicolor, and Iris virginica.

2.1 Take a look at the data

names(iris)
## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width" 
## [5] "Species"
str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 

2.2 Let’s use a scatterplot to look at the relationship between petal length and width

# here's a basic plot command: plot(x, y)
# this says using the dataframe 'iris', plot 'petal width' on the x axis, and 'petal length' on the y axis
with(iris, 
     plot(Petal.Width, 
          Petal.Length))

# an alternative way to write this is to specify the dataframe iris, followed by a dollar sign $, then the column name
plot(iris$Petal.Width, 
     iris$Petal.Length)

# I use this notation to help me keep track of multiple columns within different data frames

# another way to make the same plot is to use a tilde ~
# In R the tilde is used to separate left and right sides of a model formula
# so you could read this as: plot Petal Length as a function of Petal Width
plot(iris$Petal.Length ~ iris$Petal.Width)

# make this look nicer by adding axis labels for the x axis (xlab) and y axis (ylab), plus a title (main)
plot(iris$Petal.Length ~ iris$Petal.Width,
  xlab = "Petal Width (cm)", ylab = "Petal Length (cm)",
  main = "Data from three iris species"
)

# there is some clear separation in the points.  To see if the groups correspond to species, give each species its own color with col= (this only works if the variable you are using to define the colors is categorical)
plot(iris$Petal.Length ~ iris$Petal.Width,
  xlab = "Petal Width (cm)", ylab = "Petal Length (cm)",
  main = "Data from three iris species", col = iris$Species
)
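The formula interface also accepts a data = argument, which avoids repeating iris$ in front of every column. The sketch below uses it and, as one possible illustration of subsetting, then plots only the rows for a single species. The object name setosa is made up for this example.

```r
# Same scatterplot as above, written with the data = argument
plot(Petal.Length ~ Petal.Width,
  data = iris,
  xlab = "Petal Width (cm)", ylab = "Petal Length (cm)",
  main = "Data from three iris species", col = iris$Species
)

# Keep only the rows where Species is "setosa" (all columns are kept)
setosa <- iris[iris$Species == "setosa", ]
nrow(setosa)

# Plot just that subset
plot(Petal.Length ~ Petal.Width,
  data = setosa,
  xlab = "Petal Width (cm)", ylab = "Petal Length (cm)",
  main = "Iris setosa only"
)
```

The comma inside the square brackets matters: the condition before it selects rows, and leaving the part after it empty keeps every column.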

2.3 Use a boxplot to further visualize the differences between species

# the same formula can be used to make a boxplot
boxplot(iris$Petal.Length ~ iris$Species)

# then color in the boxes using col= and specifying the colors as a vector
boxplot(iris$Petal.Length ~ iris$Species,
  col = c("black", "red", "green")
)

# if we have not yet mentioned it, you create a vector of multiple items in R using the c() function, which is short for 'combine' (often read as 'concatenate')
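As a quick illustration of c(), the sketch below builds a character vector of colors, checks its length, and picks out elements by position with square brackets. The name box_cols is made up for this example.

```r
# Combine three color names into one character vector
box_cols <- c("black", "red", "green")
length(box_cols)

# Square brackets pull out elements by position
box_cols[2]        # the second color
box_cols[c(1, 3)]  # the first and third colors

# boxplot() uses the colors in order, one per box
boxplot(Petal.Length ~ Species, data = iris, col = box_cols)
```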

3 Plotting Exercise

  • Open a new R script or just add onto the chunk of code above

  • Play around with adding axis labels, title, and changing colors in the boxplot

3.1 We will go back to the scatterplot and make our plots look nicer

## let's go back to the iris data scatterplot

# use pch= to change the plotting symbol (stands for plot character)
plot(iris$Petal.Length ~ iris$Petal.Width,
  xlab = "Iris Petal Width", ylab = "Iris Petal Length",
  main = "Data from three iris species", pch = 2
) # all points will be a triangle

plot(iris$Petal.Length ~ iris$Petal.Width,
  xlab = "Iris Petal Width", ylab = "Iris Petal Length",
  main = "Data from three iris species",
  pch = c(1, 2, 18)[unclass(iris$Species)]
)

# This works by using c(1, 2, 18) to create a vector of three plotting symbols.
# unclass(iris$Species) turns the species column from categories
# (a "factor" data type in R terminology) into integer codes (1, 2, or 3), one per flower.
# Using those codes inside [ ] picks the matching symbol for each flower:
c(1, 2, 18)[unclass(iris$Species)]
##   [1]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
##  [24]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
##  [47]  1  1  1  1  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
##  [70]  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
##  [93]  2  2  2  2  2  2  2  2 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18
## [116] 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18
## [139] 18 18 18 18 18 18 18 18 18 18 18 18
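The same indexing trick can be seen on a tiny made-up factor. This is only a sketch: the values in toy are invented, and as.integer() is used here as an alternative to unclass() that returns the same integer codes without the leftover levels attribute.

```r
# A small invented factor with three categories
toy <- factor(c("a", "b", "a", "c"))

# Each value is stored as an integer code: a = 1, b = 2, c = 3
codes <- as.integer(toy)
codes

# Use the codes as positions in the vector of plotting symbols
shapes <- c(1, 2, 18)
shapes[codes]
```

Each element of the result is the symbol for the corresponding element of toy, so the output has the same length as the factor, not the same length as shapes.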
# do the same thing with assigning different colors to the 3 species
plot(iris$Petal.Length ~ iris$Petal.Width,
  xlab = "Iris Petal Width", ylab = "Iris Petal Length",
  main = "Data from three iris species",
  col = c("magenta", "dark green", "blue")[unclass(iris$Species)]
)

# add a legend: specify where it will be located, give one label for each species, and specify the colors and plot characters (symbols)
legend("topleft",
  legend = unique(iris$Species),
  col = c("magenta", "dark green", "blue"), pch = 1
)

# alternatively, you can use x,y coordinates to place the legend
legend(1.8, 4,
  legend = unique(iris$Species),
  col = c("magenta", "dark green", "blue"), pch = 1, bty = "n"
)
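The legend calls above rely on unique(iris$Species) returning the species in the same order as the factor levels, which holds for iris because its rows are grouped by species. For data that are not sorted this way, levels() is a safer source of legend labels, since the integer codes used to pick colors follow the level order. A minimal sketch (species_cols is a name made up here):

```r
# Legend labels taken from the factor levels
levels(iris$Species)

# In iris, unique() happens to give the same order
as.character(unique(iris$Species))

# Pair each level with its color so plot and legend cannot drift apart
species_cols <- c("magenta", "dark green", "blue")
names(species_cols) <- levels(iris$Species)
species_cols
```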

# saving plots:
# in RStudio, you can save a plot very quickly by opening the 'Export' drop-down menu in the figure window, and selecting 'Copy to Clipboard'.  I use this to quickly paste figures into a Word or PowerPoint document as a way of taking notes while I work.  That way I can place 2 similar graphs side by side and look closely at them, instead of clicking back and forth.

# to make a printable, high quality figure, click on 'Save as PDF' or 'Save as Image' and then specify the size, orientation, and file name
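Plots can also be saved from code, which is handy when a figure needs to be regenerated. A minimal sketch using the base R pdf() device follows; the file name and the 6 by 5 inch size are arbitrary choices for this example, and the file is written to a temporary folder so the sketch runs anywhere.

```r
# Choose where to save; tempdir() and the file name are placeholders
out_file <- file.path(tempdir(), "iris_scatterplot.pdf")

# Open a PDF device (width and height are in inches)
pdf(out_file, width = 6, height = 5)

plot(iris$Petal.Length ~ iris$Petal.Width,
  xlab = "Iris Petal Width", ylab = "Iris Petal Length",
  main = "Data from three iris species"
)

# Close the device so the file is written to disk
dev.off()

# Confirm that the file was created
file.exists(out_file)
```

png() works the same way for image files, with width and height given in pixels by default.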

4 Plotting exercise 2

Here’s a link to more plotting options: https://www.statmethods.net/advgraphs/parameters.html

Make a plot of the iris data with 3 colors and 3 shapes of your choosing (one for each species), and add a matching legend.